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1. Introduction 



Computational complexity studies the intrinsic difficulty of mathematically posed 
problems and seeks optimal means for their solutions. This is a rich and diverse 
field; for the purpose of this paper we present a greatly simplified picture. 

Computational complexity may be divided into two branches, discrete and con- 
tinuous. Discrete computational complexity studies problems such as graph the- 
oretic, routing, and discrete optimization; see, for example, Carey and Johnson 
[79] . Continuous computational complexity studies problems such as ordinary and 
partial differential equations, multivariate integration, matrix multiplication, and 
systems of polynomial equations. Discrete computational complexity often uses the 
Turing machine model whereas continuous computational complexity tends to use 
the real number model. 

Continuous computational complexity may again be split into two branches. The 
first deals with problems for which the information is complete. Problems where 
the information may be complete are those for which the input is specified by a 
finite number of parameters. Examples include linear algebraic systems, matrix 
multiplication, and systems of polynomial equations. Recently, Blum, Shub and 
Smale [89] obtained the first NP-completeness result over the reals for a problem 
with complete information. 

The other branch of continuous computational complexity is information-based 
complexity, which is denoted for brevity as IBC. Typically, IBC studies infinite- 
dimensional problems. These are problems where either the input or the output 
are elements of infinite-dimensional spaces. Since digital computers can handle 
only finite sets of numbers, infinite-dimensional objects such as functions on the 
reals must be replaced by finite sets of numbers. Thus, complete information is not 
available about such objects. Only partial information is available when solving 
an infinite-dimensional problem on a digital computer. Typically, information is 
contaminated with errors such as round-off error, measurement error, and human 
error. Thus, the available information is partial and/or contaminated. 

We want to emphasize this point for it is central to IBC. Since only partial and/or 
contaminated information is available, we can solve the original problem only ap- 
proximately. The goal of IBC is to compute such an approximation as inexpensively 
as possible. 
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Figure 1 

In Figure 1 (see p. 30) we schematize the structure of computational complexity 
described above. 

Research in the spirit of IBC was initiated in the Soviet Union by Kolmogorov 
in the late 1940s. Nikolskij [50], then a graduate student of Kolmogorov, studied 
optimal quadrature. This line of research was greatly advanced by Bakhvalov; see, 
e.g., Bakhvalov [59, 64, 71]. In the United States research in the spirit of IBC 
was initiated by Sard [49] and Kiefer [53]. Kiefer reported the results of his 1948 
MIT Master's Thesis that Fibonacci sampling is optimal when approximating the 
maximum of a unimodal function. Sard studied optimal quadrature. Golomb and 
Weinberger [59] studied optimal approximation of linear functionals. Schoenberg 
[64] realized the close connection between splines and algorithms optimal in the 
sense of Sard. 

IBC is formulated as an abstract theory and it has applications in numerous 
areas. The reader may consult TWW [88]^ for some of the applications. IBC has 
benefitted from research in many fields. Influential have been questions, concepts, 
and results from complexity theory, algorithmic analysis, applied mathematics, nu- 
merical analysis, statistics, and the theory of approximation (particularly the work 
on n-widths and splines). 

In this paper we discuss, in particular, IBC research for two problems of numer- 
ical analysis. We first contrast IBC and numerical analysis, limiting ourselves to 
just one characteristic of each. 

IBC is a branch of computational complexity, and optimal (or almost optimal) 
information and algorithms are obtained from the theory. In numerical analysis, 
particular classes of algorithms arc carefully analyzed to sec if they satisfy certain 
criteria such as convergence, error bounds, efficiency, and stability. 

Numerical analysis and IBC have different views on the problems which lie in 
their common domain. The authors of this paper have worked in both numerical 
analysis and IBC, and believe the viewpoints are not right or wrong, just different. 

On the other hand, in many research groups around the world, people work on 
both numerical analysis and IBC, and do not draw a sharp distinction between the 
two. They believe IBC can serve as part of the theoretical foundation of numerical 
analysis. 

We believe there might be some profit in discussing the views of numerical anal- 
ysis and IBC. Unfortunately Parlett [92]^ does not serve this purpose since, as 
we shall show, this paper ignores relevant literature and is mistaken on issues of 
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complexity theory. 

For example, P [92] contains a central misconception about IBC which immedi- 
ately invalidates large portions of the paper. P [92] assumes that the information 
is specified (or fixed). Indeed, the first "high level criticism" is that IBC "is not 
complexity theory" (see P [92, 2. A]), since "specified information" is used. 

But it is the very essence of IBC that both the information and the algorithms 
are varied. Indeed, one of the central problems of IBC is the optimal choice of 
information. Significant portions of three monographs, TW [80] and TWW [83, 
88], all of which are cited in P [92], are devoted to this issue. We return to this 
issue in §3 after notation has been established. 

In P [92] , the author limits himself to "matrix computations, which is the area 
we understand best." We do not object to discussing matrix computations, al- 
though they constitute a small fraction and are atypical of IBC. For example, in 
the recent monograph TWW [88], some ten pages, just 2%, are devoted to matrix 
computations. Matrix computations are atypical since complete information can 
be obtained at finite cost. However, even in this particular area, P [92] ignores 
relevant literature and does not exhibit a grasp of the complexity issues. Since the 
discussion will, of necessity, assume some rather technical details concerning matrix 
computations, we will defer it to §§5 and 6. 

We stress that we are not questioning the importance of matrix computations. 
On the contrary, they play a central role in scientific computation. Furthermore, 
we believe there are some nice results and deep open questions regarding matrix 
computations in IBC. 

But the real issue is, after all, IBC in its entirety. P [92] is merely using the two 
papers TW [84] and Kuczynski [86] on matrix computations to criticize all of IBC. 
We therefore respond to general criticisms in §§3 and 4. 

To make this paper self-contained we briefly summarize the basic concepts of 
IBC in §2. Section 7 deals with possible refinements of IBC. A summary of our 
rebuttal to criticisms in P [92] is presented in §8. 

2. Outline of IBC 

In this section we introduce the basic concepts of IBC and define the notation 
which will be used for the remainder of this paper. We illustrate the concepts 
with the example of multivariate integration, a typical application of IBC. A more 
detailed account may be found in TWW [88]. Expository material may be found 
in W [85], PT [87], PW [87], and TW [91]. Let 

S-.F^G, 

where F is a subset of a linear space and G is a normed linear space. We wish to 
compute an approximation to S{f ) for all / from F. 

Typically, / is an element from an infinite-dimensional space and it cannot be 
represented on a digital computer. We therefore assume that only partial informa- 
tion^ about / is available. We gather this partial information about / by computing 
information operations L[f), where L G A. Here the class A denotes a collection 
of information operations that may be computed. We illustrate these concepts by 
an example. 
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Example: Multivariate integration. Let F he a, unit ball of the Sobolev class 
Wp''^ of real functions defined on the d-dimensional cube D = [0, 1]'' whose rth 
distributional derivatives exist and are bounded in Lp norm. Let G = R and 

sif) = [ mdt. 

Jd 

Assume pr > d. To approximate S{f), we assume we can compute only function 
values. That is, the class A is a collection of i: F — > M, such that for some x from 
D,L{f) = fix),yf€F. □ 

For each / e F, we compute a number of information operations from the class 
A. Let 

N{f ) = ), L2{f ),..., )], LiG A, 

be the computed information about /. Wc stress that the Li as well as the num- 
ber n can be chosen adaptively. That is, the choice of may depend on the 
already computed ), L2 (/),... , Li_i(/ ). The number n may also depend on 
the computed Lj(/). (This permits arbitrary termination criteria.) 

N{f ) is called the information about /, and N the information operator. In 
general, N is many-to-one, and that is why it is impossible to recover the element /, 
knowing y = N{f ) for / € F. For this reason, the information N is called partial. 

Having computed N{f ), we approximate S{f ) by an element U{f ) = (j>{N{f )), 
where 0: N{F) ^ G. A mapping <j> is called an algorithm. 

The definition of error of the approximation U depends on the setting. We 
restrict ourselves here to only two settings. In the worst case setting 

e(C/) = sup||5(/)-[/(/)||, 
and in the average case setting, given a probability measure /x on F, 

eiU)=(^ljS{f)-U{f)ff,{df) 
Example (continued). The information is given by 

N{f) = [f{x,),f{X2),...,f{Xr,)] 

with the points Xi and the number n adaptively chosen. An example of an algorithm 
is a linear algorithm given by U{f ) = <p{N{f )) = J27=i fi^i) some numbers 
a,. 

In the worst case setting, the error is defined as the maximal distance \S{f) — 

U{f)\ in the set F. In the average case setting, the error is the L2 mean of \S(f ) — 
U{f)\ with respect to the probability measure /x. The measure fi is sometimes 
taken as a truncated Gaussian measure. □ 

To define the computational complexity we need a model of computation. It is 

defined by two assumptions: 

(1) We are charged for each information operation. That is, for every L G A 
and for every f G F, the computation of L{f) costs c, where c is positive 
and fixed, independent of L and /. 
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(2) Let fl denote the set of permissible combinatory operations including the 
addition of two elements in G, multiplication by a scalar in G, arithmetic 
operations, comparison of real numbers, and evaluations of certain elemen- 
tary functions. We assume that each combinatory operation is performed 

exactly with unit cost. 

In particular, this means that we use the real number model, where we can 
perform operations on real numbers exactly and at unit cost. Modulo roundoffs 
and the very important concept of numerical stability, this corresponds to floating 
point arithmetic widely used for solving scientific computational problems. 

We now define the cost of the approximations. Let cost (A'', / ) denote the cost of 
computing the information N{f). Note that cost(iV, /) > cn, and the inequality 
may occur since adaptive selection of Li and n may require some combinatory 
operations. 

Knowing y = N{f), we compute U{f) = (t>{y) by combining the information 
Li{f). Let cost(0, y) denote the number of combinatory operations from Q. needed 
to compute (jiiy). We stress that cost{N,f) or cost(0, y) may be equal to infinity 
if A''(/) or (j){y) use an operation outside O or infinitely many operations from A 
or Q, respectively. 

The cost of computing U{f), cost(?7, /), is given by 

cost(C/,/) = cost(iV,/) + cost(0,iV(/)). 

Depending on the setting, the cost of U is defined as follows. In the worst case 
setting 

cost(?7) = sup cost{U,f), 
feF 

and in the average case setting 

cost([/) = y cost{U,f)iJ,{df). 

We are ready to define the basic notion of e-complexity. The £-complexity is 
defined as the minimal cost among all U with error at most e, 

comp(£) = inf{cost(C/) : U such that e{U) < e}. 

(Here we use the convention that the infimum of the empty set is taken to be 
infinity.) Depending on the setting, this defines the worst case or average case 
e-complexity. 

We stress that we take the infimum over all possible U for which the error does 
not exceed e. Since U can be identified with the pair {N,(j)), where A'' is the 
information and ^ is the algorithm that uses that information, this means that 
we take the infimum over all information N consisting of information operations 
from the class A, and over all algorithms (j) that use N such that (TV, </>) computes 
approximations with error at most e. 

Remark. The complexity depends on the set A of permissible information oper- 
ations and on the set Q of permissible combinatory operations. Both sets are 
necessary to define the complexity of a problem. This is beneficial because the 
dependence of complexity on A and O enriches the theory; it enables us to study 
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the power of specified information or combinatory operations. We illustrate the 
role of A and f2 by a number of examples. 

We begin with the role of A. Assume that F is a subset of a linear space of 
functions. Let Ai consist of all linear functionals, and let A2 consist of function 
evaluations. For many applications A2 is more practical. Let il be defined as above. 

Consider the integration example. For this problem, Ai is not a reasonable 
choice since any integral could be computed exactly with cost c. For A2, we get the 
multivariate integration problem discussed in this section. 

Consider next the approximate solution of 2rnth-order elliptic linear partial dif- 
ferential equations whose right-hand side belongs to the unit ball of H'^{D) for a 
bounded simply-connected C°° region D of IR''. Let G ~ H"^{D). Werschulz has 
shown that the worst case complexity in the class Ai is proportional to £-<^/('"+'")^ 
and in the class A2 it is proportional to thorough study of this subject may 

be found in the research monograph Werschulz [91]. Thus, the complexity penalty 
for using A2 rather than Ai goes to infinity as e goes to zero for m > 0; see also 
TWW [88; Chapter 5, Theorem 5.9]. On the other hand, Werschulz has shown 
that the complexity of Fredholm integral equations of the second kind is roughly 
the same for Ai and A2; see Werschulz [91] as well as TWW [88, Chapter 5, §6]. 

We now illustrate the role of for the approximate solution of scalar complex 
polynomial equations of degree d using complete information, i.e., A consists of 
the identity mapping. Let Oi consist of the four arithmetic operations (over the 
complex field), and let ^2 consist of the four arithmetic operations and complex 
conjugation. We confine ourselves to purely iterative algorithms. Then for d > 4, 
McMuUen [85] proved that the problem cannot be solved for the class fli , whereas 
Shub and Smalc [86] proved that the problem can be solved for the class The 
positive result of Shub and Smale [86] also holds for systems of complex multi- 
variate polynomials of degree d. Hence, the arithmetic operations are too weak for 
approximate polynomial zero finding, whereas also permitting complex conjugation 
supplies enough power to solve the problem. □ 

Example (continued). For the integration problem, the model of computation 
states that one function evaluation costs c, and each arithmetic operation, com- 
parisons of real numbers, and evaluations of certain elementary functions can be 
performed exactly at unit cost. Usually c 1. 

The worst case £-complexity for the unit ball of Wp"^ is as follows. For pr > d, 

comp(£) = 0(ce~''/'') as £ 0; 

see Novak [88] for a recent survey. Take p = +00. Then for d large relative to r, the 
worst case e-complexity is huge even for moderate e. Furthermore, if only continuity 
of functions is assumed, then the problem cannot be solved since comp(£) = -|-oo. 

For the average case setting, let F be the unit ball in the sup norm of continuous 
functions. Let /j. be a. truncated classical Wiener sheet measure; see, e.g., TWW 
[88, p. 218]. Then using results from number theory concerning discrepancy (see 
Roth [54, 80]), we have 

comp(£) = e(c£-^(log£-^)('^-^)/2) as £ ^ 0; 

see W [87, 91]. Thus, the average case complexity depends only mildly on the 
dimension d. (The same 9 result holds if the unit ball is replaced by the entire 
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space of continuous functions.) To get an approximation with cost proportional to 
coinp(e), it is enough to compute the arithmetic mean X^"=i f{xi), where n = 
6(e^^(loge^^)('^^^'/^), and the points Xi are derived from Hammersley points. □ 

A goal of IBC is to find or estimate the £-complexity, and to find an e-complexity 
optimal U, or equivalently, an e-complexity optimal pair (N,^). By e-complexity 
optimality of U we mean that the error of U is at most e and the cost of U is eqiial 
to, or not much greater than, the £-complexity. For a number of problems this goal 
has been achieved due to the work of many researchers. 

Many computational problems can be formulated using the approach outlined 
above. For some problems, including the two matrix computation problems dis- 
cussed in P [92], we need a more general formulation. We now briefly discuss this 
more general formulation; details can be found in TWW [83, 88]. 

Let F and G be given sets, and W he a, given mapping 

W: Fx [0,+oo) ^ 2°. 

We assume that W{f, 0) is nonempty and grows as s increases, i.e., for any £i < S2 
we have W{f,si) C W(/,£2), V/ e F. 

We now wish to compute an element U{f) which belongs to W{f,e) for all 
f G F. The definitions of U as well as the cost of U are unchanged. The error of 
U is now defined as follows. The error of U for / from F is 

e(C/,/) = inf{ry: U{f ) e W{f,v)}. 

Then the error of U is defined as e{U) = sup f^p e{U, / ) in the worst case setting, 
and e{U) = {j'p e^{U, f ) ii{df ) )^/-^ in the average case setting. Note that for 

W{f,e) = {g€G: \\S{f)-g\\<e} 

we have e{U, f ) = ) — U{f)\\ and the two formulations coincide. 

Finally, we illustrate how the two matrix computation problems fit in this for- 
mulation. 

(i) Large linear systems. We wish to approximate the solution of a large linear 
system Az = b by computing a vector x with residual at most e, || At — 6|| < e. Here, 
6 is a given vector, = 1, and A belongs to a class F of n x n nonsingular 
matrices. The vectors x are computed by using matrix- vector multiplications Az 
for any vector z. 

This problem corresponds to taking G = M" and 

W{A,e) = {xeG: \\Ax-b\\<e}, \/A e F. 

The class A of information operations is now given by 

A = {L : F ^ R" : there exists a vector zeW 

such that L{A) = Az, VA e F}. 
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(ii) Eigenvalue problem. 

For a matrix A from a class F otnxn symmetric matrices, we wish to compute 
an approximate eigenpair {x, A), where a; e M" with — 1, and A e K, such that 

\\Ax-Xx\\ < s\\A\\. 

As in (i), the pairs (a;, A) are computed by using matrix- vector multiplications. 

This problem corresponds to taking G = B" x R, where -B" is the unit sphere 
of K", and 

W{A,s) = {{x,X)€G: \\Ax-Xx\\<e\\A\\}, \/A€F. 
The class A is the same as in (i). 

3. The role of information 

Information is central to IBC. We indicate briefly why the distinction between 
information and algorithm is so powerful. We then respond to two general criticisms 

in P [92] regarding information. 

As explained in §2, the approximation U{f) is computed by combining informa- 
tion operations from the class A. Let y = N{f ) denote this computed information. 

In general, the operator N is many-to-one, and therefore the set N~^{y) consists 
of many elements of F that cannot be distinguished from / using N. Then the set 
SN~^{y) consists of all elements from G which arc indistinguishable from S{f). 
Since U{f) is the same for any / from the set N^^{y), the clement U{f) must 
serve as an approximation to any element g from the set SN^^{y). It is clear that 
the quality of the approximation U(f) depends on the "size" of the set SN~^{y). 
In the worst case setting, define the radius of information r{N) as the maximal 
radius of the set SN^^{y) for y e N{F). (The radius of the set A is the radius of 
the smallest ball which contains the set A.) 

Clearly, the radius of information r{N) is a sharp lower bound on the worst case 
error of any U. We can guarantee an e-approximation iff r(N) docs not exceed e 
(modulo a technical assumption that the corresponding infimum is attained) . 

The cost of computing N{f ) is at least cn, where c stands for the cost of one 
information operation, and n denotes their number in the information N. By the e- 
cardinality number m{e) we mean the minimal number n of information operations 
for which the information N has radius r{N) at most equal to e. From this we get 
a lower bound on the e-complexity in the worst case setting, 

comp(e) > cm{e). 

For some problems (see TWW [88, Chapter 5, §5.8]) it turns out that it is possible 
to find an information operator Ng consisting of m(e) information operations, and 
a mapping (p^ such that the approximation U{f ) = (f>^{N^{f j) has error at most e 
and U{f) can be computed with cost at most (c + 2) m{e). This yields an upper 
bound on the e-complexity, 

comp(e) < (c+2)m(e). 

Since usually c ^ 1, the last two inequalities yield the almost exact value of the 
£-complexity, 

comp(e) ~ cm{e). 
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This also shows that the pair {N^,(t>^) is almost e-complexity optimal. 

In each setting of IBC one can define a radius of information such that we can 
guarantee an e-approximation iff r{N) does not exceed s; see TWW [88]. This 
permits one to obtain complexity bounds in other settings. 

What is the essence of this approach? The point is that the radius of information 
as well as the e-cardinality number m(e) and the information A^^ do not depend 
on particular algorithms, and they can often be expressed entirely in terms of well- 
known mathematical concepts. Depending on the setting and on the particular 
problem, the radii of information, the e-cardinality numbers, and the information 
Ng are related to Kolmogorov and Gelfand n-widths, e-entropy, the traces of cor- 
relation operators of conditional measures, discrepancy theory, the minimal norm 
of splines, etc. 

In summary, there arc two reasons why one can sometimes obtain sharp bounds 
on £-complexity in IBC. The first is the distinction between information and algo- 
rithm. The second is that, due to this distinction, one can draw on powerful results 
in pure and applied mathematics. 

We now respond to two central criticisms in P [92] regarding information. He 
asserts: 

(i) The information is specified (or given) and therefore this "is not complexity 
theory;" see P [92, 2. A]. 

(ii) There is an "artificial distinction between information and algorithm;" see 
P [92, 1]. 

(i) P [92] repeatedly asserts that the information is "specified" or "given." We 
have already referred to this misconception in our introduction and will amplify 
our response here. 

Varying the information and the algorithms is characteristic of IBC. (For prob- 
lems for which information is complete, i.e.. A'^ is one-to-one, only the algorithms 
can be varied.) The definition of computational complexity in our work always en- 
tails varying both information and algorithms; see, for example, TW [80, Chapter 
1, Definition 3.2], TWW [83, Chapter 5, §3], W [85, 2.5], PW [87, II], TWW [88, 
Chapter 3, §3]. 

Furthermore the study of optimal information, which of course makes sense only 

if the information is being varied, is a constant theme in our work; see, for example, 
TW [80, Chapters 2 and 7], TWW [83, Chapter 4], W [85, 3.5], PW [87, III D, V 
C], TWW [88, Chapter 4, §5.3, Chapter 6, §5.5]. 

Here, we have responded to criticism (i) in general. In §§5 and 6 we respond for 
the case of matrix computations. 

(11.1) P [92, 1] claims there is an "artificial distinction between information and 
algorithm." That is, he argues that writing the approximation U{f) = 4>{N{f)) is 
sometimes restrictive. We are surprised that he does not produce a single example 
to back his claim. 

(11.2) P [92, Abstract] states that "a sharp distinction is made between infor- 
mation and algorithms restricted to this information. Yet the information itself 
usually comes from an algorithm and so the distinction clouds the issues and can 
lead to true but misleading inferences." 

We once again explain our view of the issues involved here using a simple inte- 
gration example. 

As in §2 assume that we can compute function values. How can we approximate 
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the integral of /? The approximation U{f) can be computed by evaluating / at 
a number of points, say at xi,X2, ■ ■ ■ ,Xn, and then the computed values f{xi) are 
combined to get U{f). Computations involving f{xi), the adaptive selection of the 
points Xi, and the adaptive choice of n constitute the information N{f). Denoting 
by the mapping which combines iV(/ ), we get U{f) = 4>{N{f )). 

We do not understand why this is restrictive, why it clouds the issues, and why 
it leads to "true but misleading inferences." As explained in the first part of this 
section, the distinction between information and algorithm sometimes enables us 
to find sharp bounds on complexity. 

4. The domain F 

A basic concept in IBC is the domain F. A central criticism of IBC in P [92] 
concerns F. The assertion is that there are two difficulties with F: 

(i) There is no need for F. 

(ii) There should be a charge for knowing membership in F. 

Concerning (i), the second "high level criticism" P [92, 2.B] states: 
"The ingredient of IBCT that allows it to generate irrelevant results is the prob- 
lem class F. F does not appear in our brief description of the theory in the second 
paragraph of §1 because it is not a logically essential ingredient but rather a pa- 
rameter within IBCT." 

Concerning (ii), P [92, Abstract] states: 

"By overlooking F^s membership fee the theory sometimes distorts the economics 
of problem solving in a way reminiscent of agricultural subsidies." 
First, why is F needed? 

(1.1) The set F is necessary since it is the domain of the operator S, or part of 
the domain of the operator W. 

One need not say anything further; an operator must have a domain. Neverthe- 
less we will add a few additional points regarding the domain F. 

(1.2) For discrete or finite-dimensional problems one can sometimes take the 
"maximal" set as F. Thus, in studying the complexity of matrix multiplications 
one usually takes F as the set of all n x n matrices. In graph-theoretic complexity 
one often takes F as the set of all graphs {V, E), where V is the set of vertices and 
E is the set of edges. 

However, for infinite-dimensional problems one cannot obtain meaningful com- 
plexity results if F is too large. For example, the largest F one might take for inte- 
gration is the set of Lebesgue-integrable functions, but then comp(e) = -t-oo, V e > 
in the worst case setting. The £-complexity remains infinite even if F is the set of 
continuous functions. 

To make the complexity of an infinite-dimensional problem finite, one must take 
a smaller F in the worst case setting or switch to the average case setting. Thus, 
as we saw in §2, in the average case setting with a Wiener measure, the complexity 
is finite even if F is the set of continuous functions. 

(1.3) The use of F is not confined to IBC. In discrete computational complexity 
researchers often \ise a set F which is sinallcr than the maximal set. For example, if 
F is the set of all graphs then many problems are NP-complete. If F is a specified 
smaller set, then depending on the problem it may remain NP-complete or it may 
be solvable in polynomial time. See, for example, Garey and Johnson [79]. 
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(1.4) We believe the dependenee of complexity on F is part of the richness of IBC. 
For example, in the integration problem it is interesting to know how complexity 
depends on the number of variables and the smoothness of the integrands. 

(1.5) For a moment, we specialize our remarks to matrix computations. One could 
study the complexity of large linear systems for the set F of all invertible matrices 
of order n. Then to compute an e-approximation one would have to recover the 
matrix A by computing n matrix-vector multiplications; this is a negative result. 

We find criticism (i) particularly odd since an entire book, Parlett [80] , is devoted 
to only the eigenvalue problem for symmetric matrices. The reason is, of course, 
that the algorithms and the analysis for the symmetric eigenvalue problem are very 
different than for arbitrary matrices. But then why is the concept of F so elusive? 

Researchers in numerical linear algebra often consider other important subsets 
of matrices such as tridiagonal, Toeplitz, or Hessenberg matrices. 

We turn to the criticism that there should be a charge for knowing membership 
in F. 

(11.1) Is IBC being held to a higher standard? Do researchers in other disciplines 
charge for F7 For example, researchers in numerical analysis often analyze the cost 
and error of important algorithms. The analysis depends on F. To give a simple 
example, the analysis of the composite trapezoidal rule usually requires that the 
second derivative of the integrand is bounded. There is no charge for membership 
in F. Indeed, how would one charge for knowing that a function has a bounded 
second derivative? 

(11.2) We believe that P [92] confuses two different problems: 

(a) approximation of 5'(/ ) for / from F, 

(b) the domain membership problem; that is, does / belong to F? 

Domain membership is an interesting problem which may be formulated within 
the IBC framework, although it has nothing to do with the original problem of 
approximating S{f ) for / e F. 

We outline how this may be done. First, to make the domain membership 
problem meaningful we must define the domain of /, say the set F, in such a way 
that the logical values of / e F vary with / from F, i.e., 9 F n F ^ F. Let 
5 : F ^ {0, 1} C M be given by 

S{f)=XF{f), V/eF, 

where Xf is the characteristic (indicator) function of F. 

Then the problem is to compute S{f ) exactly or approximately. Observe that 
we now assume that f G F just as we assumed that / € F for problems of type (a). 

For the domain membership problem we charge for computing an approximation 
to S{f), and the complexity of the domain membership problem is the minimal 
cost of verifying whether f € F. 

In the worst case setting, only the exact computation of S{ f ) makes sense since 
for £ > ^ the problem is trivial, and for e < i it is the same as for e = 0. However 
for the average case or probabilistic settings, an e-approximation may be reasonable. 
For instance we may wish to compute S{f ) with probability 1 — e. 

It is easy to see that, in general, the domain membership problem cannot be 
solved in the worst case setting. To illustrate this, let F be the set of continous 
functions, and let F be the set of r times continuously differentiable functions, 
r > 1. Let the class A of information operations consist of function values. It is 
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obvious that knowing n values of /, no matter how large n may be, there is no way 
to verify whether / is a member of F. 

The domain membership problem can be studied in the average case or prob- 
abilistic settings. Its complexity may be large or small depending on F and F. 
An example of work for this problem is Gao and Wasilkowski [90] who study a 
particular domain membership problem. 

(ii.3) Finally, we are at a loss to understand the following sentence from P [92, 
2.B], "Whenever F is very large (for example, the class of continuous functions or 
the class of invertible matrices) then it is realistic to assign no cost to it." Why is 
it realistic to assign no cost for "large" F, and why is it necessary to assign cost to 
"small" F7 Where is the magic line which separates large F from small F? 

5. Large linear systems 

We briefly describe IBC research on large linear systems and then respond to 
the criticisms in P [92]. Let 

Ax = b, 

where A G F, and F is a class of n x n nonsingular matrices. Here 6 is a known 

n X 1 vector normalized such that = 1, and 1| • 1| stands for the spectral norm. 

Our problem is defined as follows. For any A & F and any = 1 compute an 
£-approximation x, 

\\Ax-b\\ <s. 

Usually A is sparse and therefore Az can be computed in time and storage 
proportional to n. It is therefore reasonable for large linear systems to assume 
that the class A of information operations consists of matrix-vector multiplications. 
That is, we can compute Azi,Az2, ■ ■ ■ ,Azk, where Zj may depend on the known 
vector b and on the previously computed vectors Azi, . . . , Azi-i. To stress that the 
right-hand side vector b is known we slightly abuse the notation of §2 and denote 

Nk[A,b) = [b,Az^,...,Azu\, AgF, (5.1) 

as the information about the problem. The number k is called the cardinality of 
information. For this to be of interest, we need k <^n. 

Krylov information is the special case when we take Zi = b and Zi = Azi-i. 
Thus Krylov information is given by 

N^^{A,b) = [b,Ab,...,A% 

In what follows we will use the concept of orthogonal invariance of the class F. 
The class F is orthogonally invariant iff 

AgF implies Q'^AQgF 

for any orthogonal matrix Q, i.e.. satisfying Q"^Q = I. 

Examples of orthogonally invariant classes include many of practical interest 
such as symmetric matrices, symmetric positive definite matrices, and matrices 
with uniformly bounded condition numbers. 

We first discuss optimal information for large linear systems which is defined as 
follows. The £-cardinality number m{e) (see §3) denotes now the minimal cardi- 
nality k of all information N). of the form (5.1) with r{Nh) < e. Obviously, m{e) 
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depends on the class F and the class A. The information is optimal iff fc = m(e) 
and r(iV*) < e. 

Remark. In §2 we define the e-complexity optimality of a pair [N, (j)). In this section 
optimality of information A^^ is introduced. How are these two optimality notions 
related? 

In general, they are not. However, as already indicated in §2, for many prob- 
lems the cost of computing N^{A^h) is proportional to cm,{e) and there exists an 
algorithm (j)* that uses and has error e and combinatory cost proportional to 
m{e). Then the pair {N^,(j)*) is (almost) £-complcxity optimal. In this case, the 
two notions of optimality coincide and the complexity analysis reduces to the prob- 
lem of finding optimal information. Details may be found in TWW [88, Chapter 4, 
§4]. □ 

In TW [84] we conjecture that for the class A of matrix-vector multiplications 
and for any orthogonally invariant F, Krylov information is optimal. 

Chou [87], based on Nemirovsky and Yudin [83], shows that Krylov information 
is optimal modulo a multiplicative factor of 2. More precisely, let m^'^{e) denote 
the minimal cardinality k of Krylov information for which r{N^'') < e. For any 
orthogonally invariant class F, we have 

m{s) < m^'is) < 2m{e) + 2. 

Recently, Nemirovsky [91] shows that for a number of important orthogonally in- 
variant classes F and for m(e) < ^(n — 3), Krylov information is optimal, 

m{s) = m^'ie). 

We now discuss algorithms that use Krylov information. We recall the definition 
of the classical minimal residual (mr) algorithm; see, e.g., Stiefel [58]. The mr 
algorithm, (j!)™'', uses Krylov information N^'^{A,b) and computes the vector Xk 
such that 

\\Axk — b\\ = m.m{\\Wk{A)b\\ : Wk is a polynomial 

of degree < k and Wk{0) = !}• 

Thus, by definition the mr algorithm minimizes the residual in the class of polyno- 
mial algorithms. 

The mr algorithm has many good properties. Let m^'"(e, 0™'') denote the min- 
imal cardinality of Krylov information needed to compute an e-approximation by 
the mr algorithm. Obviously, 'm^'^{e) denotes the minimal cardinality of Krylov 
information needed to compute an £-approximation in the class of all algorithms. 
For any orthogonally invariant class F, we have (see TW [84]) 

m^'{e) < m^'{e,(l)""') < m^'{e) + 1. 

These bounds are sharp. That is, for some F we have m^''(e) = m^''(£, (/>™'^), and 
for other F wc have m^''{e, (f>™'^) — m^''(e) + 1. 

For all practically important cases, ni^'^{e) is large and there is no significant 
difference between m^'"(e, 0™'') and m^''(e). Therefore the mr algorithm is always 
recommended as long as F is orthogonally invariant. 
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The mr algorithm minimizes, up to an additive term of 1, the number of matrix- 
vector multiplications needed to compute an £-approximation among all algorithms 
that use Krylov information in an orthogonally invariant class F. In this sense, the 
mr algorithm is Krylov- optimal, or for brevity, optimal. 

We comment on the mr algorithm. 

(1) The mr algorithm computes Xk without using the additional properties of 
A, A G F, given in the definition of the class F. This is desirable since 

the computation of Xk is the same for all F. The vector Xk can be com- 
puted by the well-known three-term recurrence formula using at most 10 kn 
arithmetic operations. 

(2) Although the mr algorithm competes with all algorithms, in particular with 
algorithms that may use the additional properties of A given in the definition 
of F, the mr algorithm can lose at most one insignificant step. Equivalently, 
one may say that for any orthogonally invariant class F , the a priori in- 
formation about the class F and the fact that A £ F is worth at most one 
step. 

(3) On the other hand, if F is not orthogonally invariant then the mr algorithm 
may lose its good properties. Example 3.5 of TW [84] provides such a class 
for which the worst happens; the mr algorithm takes n steps to solve the 
problem, whereas the optimal algorithm, which is nonpolynomial, takes only 
one step. 

For an orthogonally invariant class F and for the class A of matrix- vector mul- 
tiplications, these results yield that the pair Krylov information and mr algorithm 
is (almost) £-complexity optimal in the sense of §2. Furthermore, we have rather 
tight bounds on the worst case complexity. More precisely, 

comp(e) = cam^'ie,(t)'^'), (5.2) 

where c is the cost of one matrix-vector multiplication and 

ae [0.5 - (e, (t)""'), 1 + 10 n/c]. 

For small e and c^ n, we have roughly a £ [5,1]- 

Because of (5.2), the problem of obtaining the complexity reduces to the problem 
of finding m^'^{e, </>'"''). This number is known for some classes F; see TW [84] and 
TWW [88, Chapter 5, §9]. We discuss two classes: 

Fi = {A: A = A'^ >0, and PII2 \\A-^h < M}, 
F2 = {A: A = A'^, and ||A||2 \\A-^\2 < M}. 

That is, Fi is the class of symmetric positive definite matrices with condition num- 
bers bounded uniformly by M. Here M is a given number, M > 1. The class F2 
differs from Fi by the lack of positive definiteness. 

For these two classes, the result of Ncmirovsky [91] can be applied and for m{e) < 
^(n — 3) we have better bounds on a; namely a G [1 — l/m^'^{e, 0™'') , 1 -|- 10 n/c]. 
Thus, for small e and n, a c:^ 1. 

For the class Fi, we have 

m^'{e,(P'^') =min J n. 



ln((l + (l-£^)V^)/£) 
ln((MV2 + i)/(MV2_ 1)) 
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For small e, large M, and n > M^/^ In (2/e)/2, we have 

Kr/ j,mrN i 2 

For the class F2, we have 

m^^(e,.^'"^) =niin|n,2 
For small e, large M, and n > Mln (2/e), we have 

m^^(e,(/)"") ~Mln^. 

These formulas enable us to compare the complexities for classes Fi and F2 . For 
small £, large M, and n > 2Mln (2/e) + 3, we have 

comp(e, Fi) 1 
comp(£, F2) ~ 2Vm' 

This shows how positive definitcness decreases the £-complexity. 

P [92] has four "high level" criticisms of IBC research on the large linear systems 
problem. We also select three additional criticisms from P [92, 4]. We shall respond 
to these seven criticisms. P [92] contains other misunderstandings and errors re- 
garding this topic but we will not try the reader's patience by responding to each 
of these. We list the seven criticisms of P [92]: 

(i) IBC "is not complexity theory" since "the stubborn fact remains that re- 
stricting information to Krylov information is not part of the linear equa- 
tions problem" P [92, 2. A]. 

(ii) "The trouble with this apparent novelty is that it is not possible to evaluate 
the residual norm || — 6|| for those external z because there is no known 
matrix A (only Krylov information). So how can an algorithm that produces 
z verify whether or not it has achieved its goal of making jj A^; — 6|| < £||6||" 
P [92, 2.C]. 

(iii) "The ingredient of IBCT that allows it to generate irrelevant results is the 

problem class F [see paragraph 2 in (A)]. F did not appear in our brief 
description of the theory in the second paragraph of §1 because it is not a 
logic;ally essential ingredient but rather a parameter within IBCT;" P [92, 
2.B]. 

(iv) "IBCT's suggestion that it goes beyond the well-known polynomial class of 
algorithms is more apparent than real;" P [92, 2.C]. 

(v) "Here is a result of ours that shows why the nonpolynomial algorithms are 
of no interest in worst case complexity;" P[92, 4.3]. 

(vi) "With a realistic class such as SPD (sym, pos. def.) MR is optimal 
(strongly) as it was designed to be, and as is well known;" P [92, 4.4]. 

(vii) "The theory claims to compare algorithms restricted solely to information 
Nj. So how could the Cheb algorithm obtain the crucial parameter p?;" P 
[92, 4.4]. 



'ln((l + (l-£^)V^)/£) 
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We respond to each of these seven criticisms. 

(i) IBC does not restrict information to Krylov information. The optimahty of 
Krylov information in the class of matrix-vector multipUcations is a conclusion, not 
an assumption. 

IBC does assume a class A of information operations. The reasons why this is 
both necessary and beneficial were discussed in §2. Here we confine ourselves to 
certain classes relevant to large linear systems. 

Let Ai denote the class of matrix-vector multiplications. Then as described 
above, for an orthogonally invariant class F we may conclude that Krylov informa- 
tion is optimal to within a multiplicative factor of at most 2. Furthemore, we may 
conclude that Krylov information and the mr algorithm are almost e-complexity 
optimal. Rather tight bounds have been obtained on the complexity of important 
classes such as Fi and F2, see above. Additional classes of matrices are studied in 
TW [84]. 

Let A2 denote the class of information operations where inner products of rows 
(or columns) of A and an arbitrary vector z can be computed. Rabin [72] studied 
the class A2 for the exact solution of linear systems, e = 0, and for an arbitrary 
nonsingular matrix A. He proved that, roughly, ^n^ inner products are sufficient 
to solve the problem. No results are known for e > 0. 

Let A3 denote the class of information operations consisting of arbitrary linear 
functional. Optimality questions for the class A3 are posed in TW [84]. No results 
are known and we believe this to be a difficult problem. 

Let A4 denote the class of information operations consisting of continuous non- 
linear functional, and let A5 denote the class of nonlinear functionals. In general, 
complexity results in A4 and A5 can be different; see Kacewicz and Wasilkowski 
[86] and Mathe [90]. For linear systems, these classes are too powerful since all 
entries of the matrix A can be recovered by knowing the value of one continuous 
nonlinear functional. Thus, the e-cardinality number is 1 even for e = 0; see TW 
[80, Chapter 7, §3] for related material. 

(ii) If the class A consists of matrix-vector multiplications then, of course, we 
can evaluate the residual \\Az — b\\ for any z. If z is outside of a Krylov subspace 
this requires one additional matrix-vector multiplication. 

On the other hand, it is sometimes possible to guarantee that \\Az — b\\ < e, 
without computing the residual \\Az — b\\. This can be done by using a priori 
information that A G F and the computed Krylov information. An example of 
such a situation is provided by the Chebyshev algorithm for the class F = {A = 
I^B: B = B'^, \\B\\ <p<l}. 

In general, if the assumptions are satisfied, IBC is predictive. The results of 
the theory guarantee an e-approximation. One simply does the amount of work 
specified by the upper bound on the complexity. For important cla.Hsc^s of matrices 
we have seen above that there are rather tight bounds on the complexity. Therefore 
this strategy does not require much more work than necessary. 

For most problems there is no residual that can be checked. There are residuals 
for problems related to solving linear or nonlinear equations. In the multivariate 
integration example of §2, there is no residual that can be computed. Yet, IBC 
guarantees an e-approximation by using a priori information about the class F . 

(iii) We responded in general to the criticism that F is not needed in §4; here 
we focus on large linear systems. On this problem P [92, 2.B] states that "IBCT 
seems to use F as a tuning parameter designed to keep k < n." 
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The domain F is not a tuning parameter; it is needed for the problem to be well 
defined. The domain F contains all a priori knowledge about matrices A. The more 
we know a priori, the smaller the domain F becomes, and as F becomes smaller, 
the problem becomes easier. Furthermore, a priori information is often available in 
practice. For example, matrices which occur in the approximation of elliptic partial 
differential operators are symmetric positive definite, often with known bounds on 
condition numbers. 

Fortunately, many important classes which occur in practice are orthogonally 
invariant and the s-complcxity optimality of Krylov information and the mr algo- 
rithm may be applied. 

Of course, numerical analysts use different algorithms for different classes of 
matrices (symmetric, positive definite, tridiagonal, Tocplitz, etc.) It is therefore all 
the more surprising that P [92] objects to the concept of the class F. 

(iv) P [92, 2C] claims that there is no need to go "beyond the well-known polyno- 
mial class of algorithms." It should be obvious that all algorithms must be allowed 
to compete if wc want to establish lower bounds on complexity. 

For orthogonally invariant classes it turns out that the restriction to the polyno- 
mial class of algorithms does not cause any harm since the classical mr algorithm 
may lose at most one insignificant step. But this had to be proven! 

In fact, it is not uncommon in computational complexity that the known algo- 
rithms (that use the specific information) turn out to be optimal or close to optimal. 
Examples include the Horner algorithm for evaluating a polynomial, the finite ele- 
ment method with appropriate parameters for elliptic partial differential equations, 
or the bisection algorithm for approximating a zero of a continuous function that 
changes sign at the interval endpoints. 

For large linear systems, a sufficient condition for almost £-complexity optimality 
of Krylov information and the mr algorithm is orthogonal invariance of the class F. 
As mentioned above, Example 3.5 of TW [84] shows that if F is not orthogonally 
invariant, the mr algorithm may lose its optimality. In this example the restriction 
to the polynomial class of algorithms is harmful because the optimal algorithm is 
nonpolynomial . 

(v) P [92, 4.3] supports his claim that nonpolynomial algorithms are not inter- 
esting by the Theorem of §4.3. This theorem holds for the class of SPD of all n x n 
symmetric positive definite matrices. In this theorem it is shown that for every 
nonpolynomial algorithm which computes an approximation outside the Krylov 
subspace for A G SPD, there exists a matrix from SPD which has the identical 
Krylov information as A and for which the residual is arbitrarily large. 

We do not understand why the Theorem of §4.3 and the one page sketch of its 
proof were supplied. The same statement can be found in Example 3.4 of TW [84]. 
In addition, Example 3.4 shows that polynomial algorithms arc also not good for 
the class SPD; that is, n matrix-vector multiplications are needed to compute an 
e-approximation. The reason neither polynomial nor nonpolynomial algorithms are 
good is that the class SPD is too large. 

We stress that Example 3.4 and the Theorem of §4.3 hold for F =SPD. As men- 
tioned above, for any orthogonally invariant class F the nonpolynomial algorithms 
arc not of interest since it has been proven that the mr algorithm is optimal, pos- 
sibly modulo one matrix- vector multiplication. Also, as mentioned above, if F is 
not orthogonally invariant, a nonpolynomial algorithm may be optimal. 

(vi) P [92] claims that the mr algorithm is optimal "as it was designed to be" for 
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the class SPD. This is simply not true. The mr algorithm is defined to be; optimal 
in the class of polynomial algorithms. Optimality of the mr algorithm in the class 
of all algorithms for the class SPD requires a proof. 

(vii) As already explained, the information that A E F = {A = I — B : B = 
B^, < p < 1} is not used by the mr algorithm. This means that the mr 
algorithm does not use the parameter p which is assumed known a priori and may 
be used by competing algorithms. The parameter p is used by the Chebyshev 
algorithm and that is why the mr algorithm loses one step for the class F. P [92, 
4.4] turns the positive optimality result for the mr algorithm into the irrelevant 
question "how could the Chebyshev algorithm obtain the crucial parameter p?" By 
the way, the parameter p is not so crucial if it decreases the number of steps by 
only one! 

6. Large eigenvalue problem 

P [92] has three "high level" criticisms of the IBC research on the large eigenpair 
problem. He also criticizes the numerical testing. We shall respond to these four 
criticisms. 

We list the four criticisms of P [92] : 

(i) Kuczyhski [86] computes an unspecified eigenvalue; P [92, 2.D]. 

(ii) IBC "is not complexity theory." The reason given is that "the stiibborn 
fact remains that restricting information to Krylov information is not part 
... of the eigenvalue problem;" P [92, 2. A]. 

(iii) "The fact that b is treated as prescribed data is quite difficult to spot;" P 
[92, 2.E]. 

(iv) "The author has worked exclusively with tridiagonal matrices and has for- 
gotten that the goal of the Lanczos recurrence is to produce a tridiagonal 
matrix! Given such a matrix one has no need of either Lanczos or GMRf 
P [92, 5.5]. 

We respond to each of these four criticisms. 

(i) P [92] is certainly correct in asserting that when only one or a few eigenvalues 
of a symmetric matrix are sought, then one typically desires a preassigned eigenvalue 
or a few preassigned eigenvalues. To be specific, assume that the largest eigenvalue 
is to be approximated. 

It would be desirable to always guarantee that the largest eigenvalue Ai(^) of 
a large symmetric matrix A can be computed to within error e. Unfortunately, 
this cannot be done with less than n matrix-vector multiplications, that is, without 
recovering the matrix A; see TWW [88, Chapter 5, §10]. More precisely, let F 
denote the class of all n x n symmetric matrices and let A consist of matrix- vector 
multiplications. That is, N[A) = [Azi, . . . , Azk], where zi is an arbitrary vector 
and Zi for i > 2 may depend arbitrarily on Azi, . . . , Azi-\. Then for fc < n — 1, 
there exists no such A'' and no algorithm (j) which uses N such that U{A) = ^{N{A)) 
satisfies 

\XiiA)-UiA)\<e\\A\\, yAGF 

We are surprised that although TWW [88] is cited in P [92], he does not seem to 
be aware of this result. 

Thus, the goal of computing an e-approximation to the largest eigenvalue of a 
large symmetric matrix cannot be achieved, if less than n matrix-vector multipli- 
cations are used. This is, of course, a worst case result. There are a number of 
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options for coping with this negative result. One could stay with the worst case 
setting but settle for an unspecified eigenvalue. Or one could give up on the worst 
case guarantee and settle for a weaker one. We consider these options in turn. 

(1.1) One option is to settle for an unspecified eigenvalue. More precisely, the 
problem studied by Kuczyriski [86] and Chou [87] is defined as follows. For A £ F, 
compute {x, A) with x G M", = 1, and A € M, such that 

\\Ax-Xx\\ <e\\A\\. 

Chou proved, modulo a multiplicative factor of 2, optimality of Krylov infor- 
mation N{A) = [Ab, . . . ,A''b], where 6 is a nonzero vector. Optimality of Krylov 
information holds independently of the choice of the vector b. Kuczynski proved, 
modulo an additive term of 2, optimality of the generalized minimal residual (gmr) 
algorithm that uses Krylov information. (Optimality of Krylov information and 
the gmr algorithm is understood as in §5. These optimality results hold for any 
orthogonally invariant class of matrices. ) 

Since the gmr algorithm has small combinatory cost, we conclude that the pair 
Krylov information and gmr algorithm is (almost) e-complexity optimal. Kuczynski 
found good bounds on the worst case error of the gmr algorithm. Hence, for n > 
e~^, the worst case e-complexity is given by 

ac 

comp(£) = — , 

where a roughly belongs to [j, 1] and, as before, c is the cost of one matrix- vector 
multiplication. 

(1.2) A second option is to attempt to approximate the largest eigenvalue but 
to settle for a weaker guarantee. KW [89]"* study this problem in the randomized 
setting. (See, e.g., TWW [88, Chapter 11] for a general discussion of the randomized 
setting.) 

In particular, the Lanczos algorithm is studied. The Lanczos algorithm uses 
Krylov information N{A) = [Ab, A'^b, . . . , A'^b] with a random vector b which is 
uniformly distributed over the unit sphere of M". The error is defined for a fixed 
matrix A while taking the average with respect to the vectors b. 

To date only an upper bound on the error of the Lanczos algorithm with ran- 
domized Krylov information has been obtained. This upper bound is proportional 
to ((lnn)/fc)2. 

As always, to obtain complexity results both the information and the algorithm 
must be varied. Lower bounds arc of particular interest. The complexity of ap- 
proximating the largest eigenvalue in the randomized setting is open. 

(ii) P [92, 2. A] states ". . . the stubborn fact remains that restricting information 
to Krylov information is not part ... of the cigcnvahie problem." 

Although we have mentioned several times in this paper that P [92] seems un- 
aware of the results regarding optimality of Krylov information we are particularly 
surprised that he appears unaware of this result in the context of the large eigenvalue 
problem. P [92] repeatedly cites Kuczynski [86] where Chou's result is reported. 

(iii) P [92, 2.E] states "the fact that b is treated as prescribed data is quite difficult 
to spot." Perhaps the reason it is difficult to spot is that it is not prescribed. 
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What is assumed known? It is known a priori that ^ is a symmetric n x n 
matrix. Furthermore, we are permitted to compute Azi, . . . , Azk, where Zi may be 
adaptively chosen. We are permitted to choose zi , which is called b, arbitrarily. In 
choosing b we cannot assume that A is known, since the raison d'etre of methods 
for solving large eigenvalue problems is just that A need not be known. 

By the result quoted in (i), it is impossible to guarantee that we can find a vector 
b such that an e-approximation to the largest eigenvalue can be computed for all 

symmetric n x n matrices with k < n. 

If Krylov information Ab, A^b, . . . , A'^b is used then the situation is even worse. 
Even for arbitrary k, i.e., even for > n, an £-approximation cannot be computed. 
Indeed, suppose wc choose a vector b and a matrix A such that Ab = b. Then 
Krylov information is reduced just to the vector b. The largest eigenvalue cannot 
be recovered (unless n = 1). Thus, for any vector b there are symmetric matrices 
A for which Krylov information will not work. 

Of course, one can choose b randomly, as was discussed above. The average 
behavior with respect to vectors b is satisfactory for all symmetric matrices. But 
then one is settling for a weaker guarantee of solving the problem. 

P [92, 2.E] claims that for Krylov information "satisfactory starting vectors are 
easy to obtain." This remark seems to confuse the worst case and randomized 
settings. To get a satisfactory starting vector b in the worst case setting, the 
vector b must be chosen using some additional information about the matrix A. If 
such information is not available, it is impossible to guarantee satisfactory starting 
vectors. On the other hand, in the randomized setting it is indeed easy to get 
satisfactory starting vectors. 

(iv) P [92, 5.5] complains that Kuczyhski [86] tests only tridiagonal matrices. 

There is no loss of generality in restricting the convergence tests of the Lanczos 
or gmr algorithms to tridiagonal matrices. That was done in Kuczyhski [86] to 
speed up his tests. What is claimed in Kuczyhski [86] for the pairs (Ti?/, 5), 
TRI a tridiagonal matrix and 6 = ei = [1, 0, . . . , 0]''^, is also true for the pairs 
{Q^ TRI Q , Q^b) for any orthogonal matrix Q. Obviously, the matrix Q^TRIQ 
is not, in general, tridiagonal. 

The confusion between the worst case and randomized settings is also apparent 
when P [92] discusses numerical tests performed by Kuczyhski [86] and by him. 

For the unspecified eigenvalue problem, Kuczyhski [86] compares the gmr and 
Lanczos algorithms in the worst case setting. These two algorithms cost essentially 
the same per step, and the gmr algorithm never requires more steps than the 
Lanczos algorithm. For some matrices, the gmr algorithm uses substantially fewer 
steps than the Lanczos algorithm. That is why in the worst case setting the gmr 
algorithm is preferable. 

P [92] performed his numerical tests for the Lanczos algorithm with random 
starting vectors b. Thus, he uses a different setting. It is meaningless to compare 
numerical results in different settings. 

Finally, extensive numerical testing is also reported in KW [89] for approximating 
the largest eigenvalue by the Lanczos algorithm with randomized starting vec;tors. 
The Lanczos algorithm worked quite well for all matrices tested. The numerical 
tests reported by P [92] and KW [89] show the efficiency of the Lanczos algorithm 
in the randomized setting. 
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7. Refinements of IBC 

Our response to the criticism in P [92] does not mean that the current model 
assumptions of IBC are the only ones possible. On the contrary, we believe that in 
some circumstances these assumptions should be refined to improve the modelling 
of computational problems. We have mentioned the desirability of such refinements 
in, e.g., TWW [88, Chapter 3, §2.3] and W [85, §9]. In this section we will very 
briefly indicate some of the possible refinements and extensions of IBC, and indicate 
partial progress. This is preparatory to responding to several comments in P [92]. 

Refinements and extensions of IBC include the following: 

(1) We usually assume the real number model in a sequential model of com- 
putation where the cost of a combinatory operation is independent of the 
precision of the operands or of the result. Also of interest is a model where 
the cost of a combinatory operation depends on the precision (bit model) 
and/or on the particular operation. Parallel and distributed models of com- 
putation should also be studied. For examples of work in these directions 
see Bojanczyk [84] who studies the approximate solution of linear systems 
using a variable precision parallel model of computation, and Kacewicz [90] 
who studies initial value problems for both sequential and parallel models 
of computation. 

(2) We usually assume that for every information operation L e A and for 
every f £ F the computation of L{f) costs c, c > 0. Also of interest is 
a model where the cost of an information operation depends on L, /, and 
precision. For an example, see Kacewicz and Plaskota [90] who study linear 
problems in a model where the cost of information operations varies with 
the computed precision. 

(3) Let 5 be a linear operator. Then we often assume that the set F is balanced 
and convex; TWW [88, Chapter 4, §5]. In particular, for functions spaces, 
we often assume that F is a Sobolev space of smoothness r with a uniform 
bound on jj/ '''^ jj. It is of interest to study F which do not have such a nice 
structure. 

P [92, 1] states "a handful of reservations about IBC have appeared in print." 
These "reservations" turn out to concern refinements of IBC. P [92] writes that 
Babuska [87] calls for realistic models. For example, Babuska points out that for 
some problems arising in practice the set F does not consist of smooth functions 
but rather of functions which are piecewise smooth with singularities at unknown 
points. We agree that this is an important problem. A promising start has been 
made by Wasilkowski and Gao [89] on estimating a singularity of a piecewise smooth 
function in a probabilistic setting. 

Babuska observes that the user may not know the class F or not know F exactly, 
and suggests the importance of algorithms which enjoy optimality properties for a 
number of classes. We agree that this is an important concern and a good direction 
for future research. See W [85, §9.3] where this problem is called the "fat" F 
problem and where partial results are discussed. One attack on this problem is to 
address the domain membership problem defined in §4. As indicated there, this 
can only be done with a stochastic assurance. 

P [92, 1] asserts that in a review of TWW [83], Shub [87] "gives a couple of 
instances of unnatural measures of cost." (These words are from P [92], not from 
Shub [87].) Shub, in a generally favorable review (the reader may want to verify 
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this), suggests circumstances when the cost of an information operation should vary. 
We concur. 

8. Summary 

P [92, 2] states five high level criticisms of IBC. We responded to them in the 
following sections: 

Criticism Response 
A 1, 3, 5, 6 
B 4, 5, 6 

C 5 
D 6 
E 6 

There are additional criticisms, and in §§5 and 6 we responded to the ones which 
seem most important. 

P [92, 1] states that "a handful of reservations about IBCT have appeared in 
print." He neglects mentioning the many favorable reviews. He cites two examples 
of reservations. We discussed the comments of Babuska [87] and Shub [87] in §7. 

P [92] is based upon the following syllogism: 

(1) Major Premise: If two specific papers of IBC are misleading, then IBC 
is flawed. 

(2) Minor Premise: Two specific papers of IBC regarding matrix computa- 
tions are misleading. 

(3) Conclusion: IBC is flawed. 

We have shown that his reasons for believing the minor premise are mistaken. 
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